
Core: Add freshness-aware loading to RESTTableOperations #16319

Open
yadavay-amzn wants to merge 1 commit into apache:main from yadavay-amzn:fix/15109-rest-table-ops-freshness

Conversation

@yadavay-amzn
Contributor

yadavay-amzn commented May 13, 2026

Summary

Adds ETag/If-None-Match support to RESTTableOperations.refresh() and commit(), so repeated refresh calls skip re-downloading unchanged table metadata.

Follow-up to #15109 (freshness-aware table loading in REST catalog).

Changes

When RESTTableOperations.refresh() receives an ETag from the server, subsequent refresh() calls send If-None-Match and return the current metadata on 304 Not Modified. The ETag is also captured from commit responses so the next refresh benefits immediately.

This reduces bandwidth and server load for repeated refresh calls (retries, transactions, reconciliation) without any API changes or cache coupling.
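
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeResponse`, `EtagAwareOps`, `recordCommit`, and the `loadTable` function are hypothetical stand-ins for the HTTP client and for `RESTTableOperations`, and metadata is modelled as a plain string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for an HTTP response; not an Iceberg class. */
final class FakeResponse {
  final int status;
  final String etag; // may be null if the server sends no ETag
  final String metadata; // null on 304 Not Modified

  FakeResponse(int status, String etag, String metadata) {
    this.status = status;
    this.etag = etag;
    this.metadata = metadata;
  }
}

/** Illustrative sketch only; the real RESTTableOperations may be structured differently. */
final class EtagAwareOps {
  // Stand-in for the HTTP GET that loads a table: request headers in, response out.
  private final Function<Map<String, String>, FakeResponse> loadTable;
  private String currentMetadata;
  private String etag;

  EtagAwareOps(Function<Map<String, String>, FakeResponse> loadTable) {
    this.loadTable = loadTable;
  }

  String refresh() {
    Map<String, String> headers = new HashMap<>();
    if (etag != null) {
      // Only send the conditional header once an ETag has been seen.
      headers.put("If-None-Match", etag);
    }
    FakeResponse response = loadTable.apply(headers);
    if (response.status == 304) {
      // Not Modified: keep the metadata and ETag already held.
      return currentMetadata;
    }
    // Full response: replace both the metadata and the remembered ETag.
    currentMetadata = response.metadata;
    etag = response.etag;
    return currentMetadata;
  }

  /** Captures the ETag from a commit response so the next refresh can be conditional. */
  void recordCommit(FakeResponse commitResponse) {
    currentMetadata = commitResponse.metadata;
    etag = commitResponse.etag;
  }
}
```

In this sketch a full response without an ETag clears the remembered value, so a later refresh falls back to an unconditional request rather than sending an outdated validator.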

Scope

This is the minimal self-contained slice of #15109. Deliberately out of scope:

  • FileIO credential refresh (blocked by final field, needs design discussion)
  • Cache callback to RESTSessionCatalog (adds coupling, can be a follow-up)
  • Config/credential updates on commit (REST spec does not include them)

Testing

5 new tests in TestRESTTableOperationsFreshness:

  • ETag sent on second refresh
  • Current metadata returned on 304
  • No header sent without prior ETag
  • ETag captured from commit response
  • If-None-Match merged with existing read headers
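
As a rough illustration of the last case, merging the conditional header into existing read headers could look like the sketch below. `ConditionalHeaders` and `withIfNoneMatch` are hypothetical names chosen for this example and are not taken from the PR; the custom header in the usage note is likewise made up.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; illustrates the merge behaviour only. */
final class ConditionalHeaders {
  private ConditionalHeaders() {}

  /**
   * Returns the read headers plus If-None-Match when an ETag is known.
   * The input map is never modified.
   */
  static Map<String, String> withIfNoneMatch(Map<String, String> readHeaders, String etag) {
    if (etag == null) {
      // No prior ETag: send the request unconditionally with the headers as given.
      return readHeaders;
    }
    Map<String, String> merged = new LinkedHashMap<>(readHeaders);
    merged.put("If-None-Match", etag);
    return Collections.unmodifiableMap(merged);
  }
}
```

For example, calling `withIfNoneMatch` with a map containing a made-up `X-Example-Header` entry and the ETag `"v1"` yields a two-entry map, while passing `null` as the ETag returns the original headers untouched.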

github-actions Bot added the core label May 13, 2026
When RESTTableOperations.refresh() receives an ETag from the server, subsequent
refresh() calls send If-None-Match and skip metadata re-download on 304 Not
Modified. This reduces bandwidth and server load for repeated refresh calls
(retries, transactions, reconciliation).

Closes apache#15109
yadavay-amzn force-pushed the fix/15109-rest-table-ops-freshness branch from b2a2850 to b585f22 May 13, 2026 17:13
@yadavay-amzn
Contributor Author

@gaborkaszab This implements the freshness-aware loading follow-up you suggested in #15109. CI is green. Would appreciate a look when you get a chance.

@gaborkaszab
Contributor

Thank you for the PR, @yadavay-amzn!
What makes me slightly uncomfortable is that we now have two different ETags: one at the cache level and one at the RESTTableOps level, and on some occasions they can diverge.

@yadavay-amzn
Contributor Author

@gaborkaszab Thanks for the review! Good question on the two ETags.

The cache-level ETag (in RESTCatalog) gates whether to re-fetch the full table metadata from the server. The ops-level ETag (in RESTTableOperations) gates whether refresh() needs to re-download metadata it already has.

They serve different purposes:

  • Cache ETag: "has the table changed since I last loaded it?" (catalog-level freshness)
  • Ops ETag: "has the metadata changed since my last refresh?" (operation-level freshness for retry loops)

They can diverge temporarily (e.g., after a commit updates the ops ETag but before the cache is invalidated), but this is safe because both are optimistic -- a stale ETag just means an extra round-trip, never stale data.

That said, if you feel strongly about unifying them, I could propagate the ops-level ETag back to the cache on refresh. Would that address your concern?
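
To make the safety argument concrete, here is a simplified model of how a server evaluates If-None-Match. It is a sketch under two assumptions: the server changes the ETag whenever the table metadata changes, and a single ETag is compared by exact equality (real servers also handle ETag lists, `*`, and weak validators). `ConditionalGet` and its methods are hypothetical names.

```java
/** Hypothetical, simplified model of server-side If-None-Match handling. */
final class ConditionalGet {
  private ConditionalGet() {}

  /** 304 only when the client's ETag matches the server's current one. */
  static int statusFor(String serverEtag, String ifNoneMatch) {
    return serverEtag.equals(ifNoneMatch) ? 304 : 200;
  }

  /** Metadata the client holds after one conditional refresh. */
  static String metadataAfterRefresh(
      String serverEtag, String serverMetadata, String clientEtag, String clientMetadata) {
    // A stale or missing ETag never matches, so the client always gets the
    // server's current metadata; only a matching ETag keeps the local copy.
    return statusFor(serverEtag, clientEtag) == 304 ? clientMetadata : serverMetadata;
  }
}
```

Under these assumptions, a holder of an outdated ETag receives a 200 with the current metadata, and a holder of the current ETag receives a 304 and keeps a copy that is already up to date, so divergence between the two ETags affects only how much data is transferred.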

